Insilico Performance
| Rank | Model | Score |
|---|---|---|
| 1 | Claude-3.5-Sonnet | 0.654 |
| 2 | GPT-4o | 0.566 |
| 3 | Gemini-1.5-Pro | 0.436 |
| 4 | Llama 3.2 90B Vision | 0.339 |
| 5 | Baseline | 0.138 |
Sub-Task Performance
Performance across the individual sub-tasks in this domain.
| Rank | Model | Score |
|---|---|---|
| 1 | Claude-3.5-Sonnet | 0.654 |
| 2 | GPT-4o | 0.566 |
| 3 | Gemini-1.5-Pro | 0.436 |
| 4 | Llama 3.2 90B Vision | 0.339 |
| 5 | Baseline | 0.138 |